NSF PAR Search | NSF Public Access Repository

Principal stratification with continuous post-treatment variables: nonparametric identification and semiparametric estimation

https://doi.org/10.1093/jrsssb/qkaf049

Lu, Sizhu; Jiang, Zhichao; Ding, Peng (July 2025, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Abstract: Post-treatment variables often complicate causal inference. They appear in many scientific problems, including non-compliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to address these challenges by adjusting for the potential values of the post-treatment variables, defined as the principal strata. It allows for characterizing treatment effect heterogeneity across principal strata and unveiling the mechanism of the treatment’s impact on the outcome related to post-treatment variables. However, the existing literature has primarily focused on binary post-treatment variables, leaving the case with continuous post-treatment variables largely unexplored. This gap persists due to the complexity of infinitely many principal strata, which present challenges to both the identification and estimation of causal effects. We fill this gap by providing nonparametric identification and semiparametric estimation theory for principal stratification with continuous post-treatment variables. We propose to use working models to approximate the underlying causal effect surfaces and derive the efficient influence functions of the corresponding model parameters. Based on the theory, we construct doubly robust estimators and implement them in the R package continuousPCE.
Identifying and bounding the probability of necessity for causes of effects with ordinal outcomes

https://doi.org/10.1093/biomet/asaf049

Zhang, Chao; Geng, Zhi; Li, Wei; Ding, Peng (July 2025, Biometrika)

Abstract: Although the existing causal inference literature focuses on the forward-looking perspective by estimating effects of causes, the backward-looking perspective can provide insights into causes of effects. In backward-looking causal inference, the probability of necessity measures the probability that a certain event is caused by the treatment given the observed treatment and outcome. Most existing results focus on binary outcomes. Motivated by applications with ordinal outcomes, we propose a general definition of the probability of necessity. However, identifying the probability of necessity is challenging because it involves the joint distribution of the potential outcomes. We propose a novel assumption of monotonic incremental treatment effect to identify the probability of necessity with ordinal outcomes. We also discuss the testable implications of this key identification assumption. When it fails, we derive explicit formulas of the sharp large-sample bounds on the probability of necessity.
Free, publicly-accessible full text available July 9, 2026
Forward selection and post-selection inference in factorial designs

https://doi.org/10.1214/24-AOS2454

Shi, Lei; Wang, Jingshen; Ding, Peng (April 2025, The Annals of Statistics)

Free, publicly-accessible full text available April 1, 2026
Mediation Analysis with the Mediator and Outcome Missing Not at Random

https://doi.org/10.1080/01621459.2024.2359132

Zuo, Shuozhi; Ghosh, Debashis; Ding, Peng; Yang, Fan (April 2025, Journal of the American Statistical Association)

Free, publicly-accessible full text available April 3, 2026
Two-phase rejective sampling and its asymptotic properties

https://doi.org/10.1093/jrsssb/qkaf002

Yang, Shu; Ding, Peng (February 2025, Journal of the Royal Statistical Society Series B: Statistical Methodology)

Abstract: Rejective sampling improves design and estimation efficiency of single-phase sampling when auxiliary information in a finite population is available. When such auxiliary information is unavailable, we propose to use two-phase rejective sampling (TPRS), which involves measuring auxiliary variables for the sample of units in the first phase, followed by the implementation of rejective sampling for the outcome in the second phase. We explore the asymptotic design properties of double expansion and regression estimators under TPRS. We show that TPRS enhances the efficiency of the double-expansion estimator, rendering it comparable to a regression estimator. We further refine the design to accommodate varying importance of covariates and extend it to multi-phase sampling. We start with the theory for the population mean and then extend the theory to parameters defined by general estimating equations. Our asymptotic results for TPRS immediately cover the existing single-phase rejective sampling, under which the asymptotic theory has not been fully established.
With random regressors, least squares inference is robust to correlated errors with unknown correlation structure

https://doi.org/10.1093/biomet/asae054

Zhang, Zifeng; Ding, Peng; Zhou, Wen; Wang, Haonan (January 2025, Biometrika)

Abstract: Linear regression is arguably the most widely used statistical method. With fixed regressors and correlated errors, the conventional wisdom is to modify the variance-covariance estimator to accommodate the known correlation structure of the errors. We depart from existing literature by showing that with random regressors, linear regression inference is robust to correlated errors with unknown correlation structure. The existing theoretical analyses for linear regression are no longer valid because even the asymptotic normality of the least squares coefficients breaks down in this regime. We first prove the asymptotic normality of the t statistics by establishing their Berry–Esseen bounds based on a novel probabilistic analysis of self-normalized statistics. We then study the local power of the corresponding t tests and show that, perhaps surprisingly, error correlation can even enhance power in the regime of weak signals. Overall, our results show that linear regression is applicable more broadly than the conventional theory suggests, and they further demonstrate the value of randomization for ensuring robustness of inference.
Full Text Available
MANGO: A Benchmark for Evaluating Mapping and Navigation Abilities of Large Language Models

Ding, Peng; Fang, Jiading; Li, Peng; Wang, Kangrui; Zhou, Xiaochen; Yu, Mo; Li, Jing; Mei, Hongyuan; Walter, Matthew (October 2024, Conference on Language Modeling (COLM))

Full Text Available
No star is good news: A unified look at rerandomization based on $p$-values from covariate balance tests

https://doi.org/10.1016/j.jeconom.2024.105724

Zhao, Anqi; Ding, Peng (April 2024, Journal of Econometrics)

Full Text Available
Covariate adjustment in randomized experiments with missing outcomes and covariates

https://doi.org/10.1093/biomet/asae017

Zhao, Anqi; Ding, Peng; Li, Fan (March 2024, Biometrika)

Summary: Covariate adjustment can improve precision in analysing randomized experiments. With fully observed data, regression adjustment and propensity score weighting are asymptotically equivalent in improving efficiency over unadjusted analysis. When some outcomes are missing, we consider combining these two adjustment methods with the inverse probability of observation weighting for handling missing outcomes, and show that the equivalence between the two methods breaks down. Regression adjustment no longer ensures efficiency gain over unadjusted analysis unless the true outcome model is linear in covariates or the outcomes are missing completely at random. Propensity score weighting, in contrast, still guarantees efficiency over unadjusted analysis, and including more covariates in adjustment never harms asymptotic efficiency. Moreover, we establish the value of using partially observed covariates to secure additional efficiency by the missingness indicator method, which imputes all missing covariates by zero and uses the union of the completed covariates and corresponding missingness indicators as the new, fully observed covariates. Based on these findings, we recommend using regression adjustment in combination with the missingness indicator method if the linear outcome model or missing-completely-at-random assumption is plausible and using propensity score weighting with the missingness indicator method otherwise.
Full Text Available
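
The missingness indicator method described in the summary of the record above (Zhao, Ding, Li) can be illustrated with a minimal sketch. This is not the authors' code: the function name `missingness_indicator` and the use of `None` to mark a missing value are hypothetical, and adding indicator columns only for covariates that actually have missing entries is an assumption made here to avoid all-zero columns.

```python
from typing import List, Optional


def missingness_indicator(
    covariates: List[List[Optional[float]]],
) -> List[List[float]]:
    """Return zero-imputed covariates followed by missingness indicators.

    Each row is one unit; None marks a missing covariate value (assumption).
    """
    if not covariates:
        return []
    n_cols = len(covariates[0])
    # Assumption: only covariates with at least one missing entry get an
    # indicator column, so no indicator column is identically zero.
    cols_with_missing = [
        j for j in range(n_cols) if any(row[j] is None for row in covariates)
    ]
    completed = []
    for row in covariates:
        # Impute every missing covariate by zero, as the summary describes.
        filled = [0.0 if value is None else float(value) for value in row]
        # Append one 0/1 indicator per covariate that has missing values.
        indicators = [1.0 if row[j] is None else 0.0 for j in cols_with_missing]
        completed.append(filled + indicators)
    return completed


example = missingness_indicator([[1.0, None], [2.0, 3.0]])
print(example)
```

The returned rows could then be passed as fully observed covariates to whichever adjustment method is chosen (regression adjustment or propensity score weighting); that downstream step is not shown here.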
Randomization Tests for Peer Effects in Group Formation Experiments

https://doi.org/10.3982/ECTA20134

Basse, Guillaume; Ding, Peng; Feller, Avi; Toulis, Panos (January 2024, Econometrica)

Measuring the effect of peers on individuals' outcomes is a challenging problem, in part because individuals often select peers who are similar in both observable and unobservable ways. Group formation experiments avoid this problem by randomly assigning individuals to groups and observing their responses; for example, do first‐year students have better grades when they are randomly assigned roommates who have stronger academic backgrounds? In this paper, we propose randomization‐based permutation tests for group formation experiments, extending classical Fisher Randomization Tests to this setting. The proposed tests are justified by the randomization itself, require relatively few assumptions, and are exact in finite samples. This approach can also complement existing strategies, such as linear‐in‐means models, by using a regression coefficient as the test statistic. We apply the proposed tests to two recent group formation experiments.
Full Text Available
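
The record above (Basse, Ding, Feller, Toulis) extends classical Fisher Randomization Tests to group formation experiments. As background only, here is a minimal sketch of the classical two-arm test that it builds on, not the paper's group-formation procedure. It assumes a completely randomized experiment, uses the absolute difference in means as the test statistic, and enumerates every possible assignment, which is practical only for small samples. The names `diff_in_means` and `frt_p_value` are hypothetical.

```python
from itertools import combinations
from typing import Sequence, Set


def diff_in_means(outcomes: Sequence[float], treated: Set[int]) -> float:
    """Mean outcome of treated units minus mean outcome of control units."""
    t = [outcomes[i] for i in range(len(outcomes)) if i in treated]
    c = [outcomes[i] for i in range(len(outcomes)) if i not in treated]
    return sum(t) / len(t) - sum(c) / len(c)


def frt_p_value(outcomes: Sequence[float], treated: Sequence[int]) -> float:
    """Exact two-sided p-value under the sharp null of no effect for any unit.

    Under the sharp null the observed outcomes are fixed, so the statistic can
    be recomputed for every assignment with the same number of treated units.
    Requires at least one treated and one control unit.
    """
    n, n_treated = len(outcomes), len(treated)
    observed = abs(diff_in_means(outcomes, set(treated)))
    at_least_as_extreme = 0
    total = 0
    for assignment in combinations(range(n), n_treated):
        total += 1
        statistic = abs(diff_in_means(outcomes, set(assignment)))
        # Small tolerance guards against floating-point ties.
        if statistic >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total


p = frt_p_value([1, 2, 3, 10, 11, 12], treated=[3, 4, 5])
print(p)
```

The abstract notes that a regression coefficient can also serve as the test statistic; in this sketch that would mean swapping out `diff_in_means` while keeping the enumeration unchanged.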
